Job seekers can find appropriate words to describe their skills and experience based on companies’ descriptions of job vacancies. To present these descriptions clearly, we extracted relevant data from the job board Adzuna. From the data provided by Adzuna, we can roughly see the geographic distribution of job vacancies and the frequency of each vacancy label. In addition, a simple text analysis shows which words are mentioned most frequently.
Because of the API usage limits, we used peers’ accounts to extract the data. The extraction loop produced four data frames, which we combined before deleting duplicate observations. To simplify later data processing, I saved the final data frame as a “raw_data” R file.
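The combine-and-deduplicate step can be sketched in base R as follows. This is a minimal illustration, not the project's actual script: the `batch_1` and `batch_2` data frames, their columns, and the `.rds` file format are all assumptions made for the example.

```r
# Hypothetical stand-ins for the data frames returned by each round of API calls.
batch_1 <- data.frame(id = c("a1", "a2"),
                      title = c("Data Analyst", "Statistician"),
                      stringsAsFactors = FALSE)
batch_2 <- data.frame(id = c("a2", "a3"),
                      title = c("Statistician", "Biostatistician"),
                      stringsAsFactors = FALSE)

# Stack the batches row-wise; all batches must share the same columns.
combined <- do.call(rbind, list(batch_1, batch_2))

# Drop rows that exactly duplicate an earlier row.
raw_data <- combined[!duplicated(combined), ]
rownames(raw_data) <- NULL

# Save the result so later steps do not need to call the API again.
# The file name and the .rds format are assumptions for this sketch.
out_file <- file.path(tempdir(), "raw_data.rds")
saveRDS(raw_data, out_file)
```

Reading the file back with `readRDS(out_file)` restores the same data frame, which keeps the analysis independent of the API quota.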
Based on the doughnut chart, almost half of the job vacancies related to data and statistics carry the “IT Jobs” label (49.2%), followed by “Scientific & QA Jobs”, which accounts for 38.4% of the total. “Healthcare & Nursing Jobs”, “Engineering Jobs” and “Admin Jobs” each account for about 2.6%.
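The percentages behind a chart like this can be computed from the label column alone. The following base R sketch uses a made-up `labels` vector, so its numbers do not match the figures reported above.

```r
# Hypothetical label column; real values would come from the Adzuna category field.
labels <- c("IT Jobs", "IT Jobs", "Scientific & QA Jobs", "IT Jobs",
            "Scientific & QA Jobs", "Admin Jobs")

# Count each label, convert counts to percentages, and sort in descending order.
label_share <- sort(round(100 * prop.table(table(labels)), 1), decreasing = TRUE)
print(label_share)
```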
We used the leaflet package to draw an interactive map with markers and popups. Click a marker to see the details of a job vacancy (company, label, title, and job description). Based on the distribution of markers, job vacancies are relatively dense in the mid-east region of the US, especially on the East Coast.
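A map of this kind might be built roughly as shown here. The `jobs` data frame, its column names, and its coordinates are invented for illustration; only `leaflet()`, `addTiles()`, and `addMarkers()` are taken from the leaflet package itself.

```r
library(leaflet)

# Hypothetical vacancies; companies, coordinates, and text are made up.
jobs <- data.frame(
  company = c("Example Corp", "Sample Labs"),
  label = c("IT Jobs", "Scientific & QA Jobs"),
  title = c("Data Analyst", "Biostatistician"),
  description = c("Analyze business data.", "Support clinical trials."),
  longitude = c(-74.006, -87.630),
  latitude = c(40.713, 41.878),
  stringsAsFactors = FALSE
)

# Build the HTML shown in each popup from the four fields named in the text.
jobs$popup_text <- paste0(
  "<b>", jobs$title, "</b><br/>",
  jobs$company, " (", jobs$label, ")<br/>",
  jobs$description
)

# One marker per vacancy on the default OpenStreetMap tile layer.
job_map <- leaflet(jobs) %>%
  addTiles() %>%
  addMarkers(lng = ~longitude, lat = ~latitude, popup = ~popup_text)
```

Printing `job_map` in an interactive R session renders the widget. Popup content is treated as HTML, so text from an external source would normally be escaped before being pasted in.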
The word frequencies in the vacancy descriptions show that “business” and “clinical” are mentioned often, which probably means that job seekers with a business or clinical background are more likely to meet companies’ requirements. “Lead” is also a high-frequency word in the descriptions, which might indicate that applicants with leadership experience are preferred.
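A word-frequency count of this sort can be sketched in base R. The `descriptions` vector and the short stop-word list below are illustrative assumptions; the tidytext package cited in the references provides `unnest_tokens()` and a fuller `stop_words` table for the same purpose.

```r
# Hypothetical descriptions standing in for the Adzuna job description field.
descriptions <- c(
  "Lead clinical data analysis for business partners",
  "Business analyst with clinical trial experience",
  "Lead the statistics team"
)

# Lowercase the text, then split on anything that is not a letter.
words <- unlist(strsplit(tolower(descriptions), "[^a-z]+"))
words <- words[nchar(words) > 0]

# A tiny illustrative stop-word list; a real analysis would use a fuller one.
stop_words_small <- c("the", "for", "with", "and", "of", "a")
words <- words[!words %in% stop_words_small]

# Count occurrences and sort from most to least frequent.
word_freq <- sort(table(words), decreasing = TRUE)
print(head(word_freq, 5))
```

The resulting table can also be converted with `as.data.frame(word_freq)` into the word and frequency columns that a word cloud function expects.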
In the word cloud of vacancy titles, “biostatistics” is also a prominent word among statistical job vacancies. The frequently used word “health” might reflect a similar trend.
From this project, we can tell that job seekers with business, clinical, and biostatistics backgrounds are more in line with vacancy descriptions. Almost 90% of vacancies are labeled “IT” or “Scientific & QA”. In addition, job vacancies related to data and statistics are concentrated in the mid-east region of the US.
1. Jeroen Ooms (2014). The jsonlite Package: A Practical and Consistent Mapping Between JSON Data and R Objects. arXiv:1403.2805 [stat.CO]. URL https://arxiv.org/abs/1403.2805
2. Dawei Lang and Guan-tin Chien (2018). wordcloud2: Create Word Cloud by ‘htmlwidget’. R package version 0.2.1. https://CRAN.R-project.org/package=wordcloud2
3. Joe Cheng, Bhaskar Karambelkar and Yihui Xie (2019). leaflet: Create Interactive Web Maps with the JavaScript ‘Leaflet’ Library. R package version 2.0.3. https://CRAN.R-project.org/package=leaflet
4. Silge J, Robinson D (2016). “tidytext: Text Mining and Analysis Using Tidy Data Principles in R.” JOSS, 1(3). doi:10.21105/joss.00037, http://dx.doi.org/10.21105/joss.00037
5. https://blog.csdn.net/jiyang_1/article/details/72179417
6. https://rstudio.github.io/leaflet/markers.html